Malicious PDF file inspection steps

Step 1: Initial Inspection

File Metadata: Check metadata such as author, creation date, and software used to create the PDF (/Author, /CreationDate, /Producer) for anomalies or suspicious entries.
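As a rough illustration of the metadata check, the sketch below pulls the /Author, /CreationDate, and /Producer values straight out of the raw bytes. It assumes the information dictionary is stored uncompressed and uses plain literal strings in parentheses; the function name `extract_info_fields` is made up for this example, and a real parser would be needed for anything hex-encoded, escaped, or stored in an object stream.

```python
import re

# Metadata keys named in the step above.
INFO_KEYS = (b"Author", b"CreationDate", b"Producer")


def extract_info_fields(data: bytes) -> dict[str, list[str]]:
    """Return the literal-string values that follow each metadata key."""
    found = {}
    for key in INFO_KEYS:
        # Matches e.g. "/Producer (Some Tool 1.0)".
        # Nested or escaped parentheses are not handled by this sketch.
        pattern = re.compile(rb"/" + key + rb"\s*\(([^)]*)\)")
        values = [v.decode("latin-1") for v in pattern.findall(data)]
        if values:
            found[key.decode()] = values
    return found
```

A key that appears several times with different values, or a producer string that does not match the claimed origin of the document, is worth a closer look.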

File Size and Hash: Note the file size and compute hashes (MD5, SHA-256) for integrity verification.
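A minimal sketch of this step using only Python's standard `hashlib` module is shown below. The function name `fingerprint` and the chunk size are choices made for this example; the file is read in chunks so that large samples do not have to fit in memory.

```python
import hashlib


def fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return the size, MD5, and SHA-256 of the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        while chunk := fh.read(chunk_size):
            size += len(chunk)
            md5.update(chunk)
            sha256.update(chunk)
    return {"size": size, "md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The resulting hashes can be recorded in the case notes and compared later to confirm the sample has not changed.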

Step 2: Static Analysis

During static analysis, we will not run the PDF file. Instead, we will systematically examine the document structure, embedded content, JavaScript, and network interactions. This includes:

Document Structure Analysis: Examine all objects in the PDF file and their dictionary entries, i.e. the PDF keywords that start with "/".
- Object Streams: Extract object streams (/Type /ObjStm). If object streams are present, decode (decompress) their contents, because objects stored inside them are hidden from a plain keyword search.
- Suspicious Keywords: Check for the following PDF keywords or object entries that are abused by adversaries to hide or execute malicious code. Note that the list is not exhaustive.
  - /OpenAction
  - /AA
  - /JavaScript
  - /JS
  - /AcroForm
  - /XFA
  - /URI
  - /RichMedia
  - /ObjStm
  - /EmbeddedFile
- Content Extraction: Extract readable text and embedded images for analysis.
- JavaScript Analysis: Identify and analyze embedded JavaScript for malicious activities. These scripts can be included for various reasons, such as:
  - Document manipulation (e.g., redirecting to malicious sites).
  - Exploiting vulnerabilities in PDF readers.
  - Triggering actions without user interaction (e.g., launching executables).
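The keyword and stream checks above can be approximated with a short script. The sketch below is a heuristic and not a PDF parser. It assumes streams use only the Flate (zlib) filter, undoes the `#xx` hex escapes sometimes used to disguise names (for example `/J#61vaScript`), and counts the keywords from the list above in both the raw file and any stream it manages to decompress. All function names are invented for this example.

```python
import re
import zlib

# Keywords from the list above; the PDF specification spells the last one /EmbeddedFile.
SUSPICIOUS = (
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/AcroForm",
    "/XFA", "/URI", "/RichMedia", "/ObjStm", "/EmbeddedFile",
)

_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def normalize_names(data: bytes) -> bytes:
    """Decode #xx escapes so that /J#61vaScript becomes /JavaScript."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def count_keywords(data: bytes) -> dict[str, int]:
    """Count each suspicious keyword in `data` after undoing hex escapes."""
    normalized = normalize_names(data)
    counts = {}
    for keyword in SUSPICIOUS:
        # The lookahead stops /JS from matching a longer name such as /JSFoo.
        pattern = re.compile(re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])")
        counts[keyword] = len(pattern.findall(normalized))
    return counts


def inflate_streams(data: bytes) -> list[bytes]:
    """Return the contents of every stream that zlib can decompress."""
    decoded = []
    for raw in _STREAM.findall(data):
        try:
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            continue  # not Flate data, or a filter this sketch does not handle
    return decoded


def scan_pdf(data: bytes) -> dict[str, int]:
    """Count keywords in the raw bytes plus every decompressed stream."""
    totals = count_keywords(data)
    for body in inflate_streams(data):
        for keyword, n in count_keywords(body).items():
            totals[keyword] += n
    return totals
```

A non-zero count is only a lead for manual review: many benign PDFs contain /AcroForm or /URI, and streams that use other or chained filters are skipped entirely by this sketch.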

Step 3: Dynamic Analysis

If required, we can also perform dynamic analysis in a sandbox environment, as covered in the "Introduction to Malware Analysis" module. This involves opening the PDF file and monitoring the actions it performs on the system.

Sandbox Execution: Open the PDF in a secure, isolated environment or a sandbox to observe its behavior:
- Monitor system calls, registry changes, process creation/termination and file system modifications.
- Capture network traffic to detect communication with malicious domains or IP addresses.
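Dedicated sandbox tooling is the usual way to monitor these changes. As a simple illustration of the file system part only, the sketch below hashes every file under a directory before and after the sample is opened and reports what was created, deleted, or modified. The function names and the idea of watching a single directory tree are assumptions made for this example; it does not cover registry, process, or network activity.

```python
import hashlib
import os


def snapshot(root: str) -> dict[str, str]:
    """Map each file path under `root` to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    state[path] = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # locked or vanished files are skipped
    return state


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and list created, deleted, and modified paths."""
    common = before.keys() & after.keys()
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```

In use, `snapshot()` would be taken on a directory of interest inside the sandbox, the PDF opened, and a second snapshot compared with `diff_snapshots()`.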

trid.exe: When we do not know a file's type, we can use trid.exe to extract some basic information about the sample.
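TrID relies on a large database of binary patterns. The sketch below is a much simpler stand-in that illustrates the idea by checking a handful of well-known magic bytes. The signature list and the function name `guess_type` are choices made for this example and are in no way a replacement for the real tool.

```python
# (magic bytes, label, may appear anywhere in the first 1024 bytes)
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy MS Office)", False),
    (b"PK\x03\x04", "ZIP archive (includes OOXML Office files)", False),
    (b"MZ", "Windows PE executable", False),
    (b"{\\rtf", "Rich Text Format document", False),
    # Many PDF readers accept the header anywhere near the start of the file.
    (b"%PDF-", "PDF document", True),
]


def guess_type(data: bytes) -> str:
    """Return a coarse file type label based on leading magic bytes."""
    head = data[:1024]
    for magic, label, anywhere in SIGNATURES:
        if head.startswith(magic) or (anywhere and magic in head):
            return label
    return "unknown"
```

For example, `guess_type(open("sample.bin", "rb").read(1024))` would label a sample before deciding which of the tools below applies; `sample.bin` is a placeholder name.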

peepdf.py: We can analyze a PDF file with peepdf.py.

olemeta.py: We can use olemeta.py, a script that parses OLE files such as MS Office documents (e.g., Word, Excel) and extracts all standard properties present in the OLE file.

oleid.py: We can use oleid.py to get more information about the sample. This script analyzes OLE files, such as MS Office documents (e.g., Word, Excel), to detect specific characteristics usually found in malicious files. For example, it can detect VBA macros and embedded Flash objects.

olevba: This script opens an MS Office file, detects whether it contains VBA macros, and extracts and analyzes the VBA source code. It can be run from the command line or imported into our own Python applications.

Zipdump can dump the content of every file in the archive using the --dumpall parameter. This is useful because we can then search through the full output for strings of interest.
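The same "dump everything, then search" idea can be sketched with Python's standard `zipfile` module. This is not how Zipdump is implemented; the function name `search_zip` and the case-insensitive matching are choices made for this example. It reads every member of the archive and reports where a byte pattern first occurs.

```python
import zipfile


def search_zip(source, needle: bytes) -> list[tuple[str, int]]:
    """Return (member name, offset) for each member that contains `needle`.

    `source` may be a file path or a binary file-like object.
    Matching is case-insensitive. Encrypted members will raise an error,
    and each member is read fully into memory, so this is for small samples.
    """
    hits = []
    wanted = needle.lower()
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            pos = data.lower().find(wanted)
            if pos != -1:
                hits.append((info.filename, pos))
    return hits
```

For an OOXML document, searching for patterns such as `b"http"` or `b"auto_open"` quickly shows which parts of the archive deserve a closer look.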

XLMMacroDeobfuscator can be used to decode obfuscated XLM macros (also known as Excel 4.0 macros). It utilizes an internal XLM emulator to interpret the macros, without fully executing the code.

To install the latest development version of XLMMacroDeobfuscator, we can use the command below:

pip install -U https://github.com/DissectMalware/XLMMacroDeobfuscator/archive/master.zip --force